Skip to content

Auth Retry Release#1006

Merged
joaodordio merged 5 commits intomasterfrom
autoretry-master
Mar 23, 2026
Merged

Auth Retry Release#1006
joaodordio merged 5 commits intomasterfrom
autoretry-master

Conversation

@franco-zalamena-iterable
Copy link
Contributor

✏️ Description

Following a successful bugbash, we now have to merge the Auto retry into master.

  • Implement auto-retry logic for JWT auth failures in offline request
    processing (SDK-342). When a queued task gets a 401 JWT error and autoRetry is
    enabled via remote config, the task is retained in the DB and processing
    pauses until a valid token is obtained via an AuthTokenReadyListener callback.
  • Add endpoint classification so unauthenticated endpoints (e.g.,
    disableDevice, mergeUser) continue processing during an auth pause while
    JWT-required tasks wait.
  • Add auth state machine (VALID/INVALID/UNKNOWN) to IterableAuthManager to
    coordinate pause/resume between IterableRequestTask, IterableTaskRunner, and
    the auth refresh flow.

Co-authored-by: Franco Zalamena <franco.zalamena@iterable.com>
Co-authored-by: “Akshay <“ayyanchira.akshay@gmail.com”>
Copy link
Member

@joaodordio joaodordio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from the reset() comment all looks good to me.

Claude came up with this warning which I thought was worth mentioning. Not tied to this branch but worth noting.

IterableResponse uses Map.of() which crashes on API 21-29.

Map.of() is a Java 9 API, available on Android only from API 30+. No coreLibraryDesugaring is configured in build.gradle. With minSdkVersion 21, this will throw NoSuchMethodError at runtime on Android 5–10 devices when setEmail/setUserId completes login.

Marking as CR so we can better understand the risk of it!

Copy link
Member

@joaodordio joaodordio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

Copy link
Collaborator

@sumeruchat sumeruchat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#: 1
Bug: Callbacks dropped on retry
Summary: onTaskCompleted unconditionally removes callbacks from the maps on
RETRY. When the task eventually succeeds, callbacks are gone. Also fires
the
failure callback prematurely on 401 before retry.
────────────────────────────────────────

#: 4
Bug: Negative exponent in getNextRetryInterval
Summary: In the primary retry path, retryCount is incremented to 1 before
getNextRetryInterval is called, so it's fine. But in edge paths like
queueExpirationRefresh's catch block or checkAndHandleAuthRefresh,
retryCount can still be 0, giving exponent -1 and halving the interval.
────────────────────────────────────────
#: 5
Bug: Thread-unsafe authTokenReadyListeners
Summary: Plain ArrayList with no synchronization, accessed from multiple
threads. The snapshot copy in notifyAuthTokenReadyListeners doesn't protect

against concurrent modification during copy construction.

#6
2. setAuthState has a TOCTOU race (IterableAuthManager.java:109-114). The
read of this.authState and subsequent write are not atomic — two threads
could both read INVALID, both transition, and both fire notifications.
Consider making setAuthState synchronized.

#: 7
Bug: updateFromRemoteConfig dead code
Summary: Never called from production code. The remote config parsing only
reads KEY_OFFLINE_MODE and KEY_AUTO_RETRY, never endpoint classification.
The method exists and is tested in isolation but never wired up.

@franco-zalamena-iterable
Copy link
Contributor Author

Bug: Callbacks dropped on retry
Summary: onTaskCompleted unconditionally removes callbacks from the maps on
RETRY. When the task eventually succeeds, callbacks are gone. Also fires
the
failure callback prematurely on 401 before retry.

What is the solution here? there can only be 1 callback at a time, if multiple calls fail we would override them. Also if there is a 401 there is a failure, independently if it retries or not. For example a tracking should not have a callback waiting for it for something to happen, if it tracked it does not really matter when

@franco-zalamena-iterable
Copy link
Contributor Author

#: 4
Bug: Negative exponent in getNextRetryInterval
Summary: In the primary retry path, retryCount is incremented to 1 before
getNextRetryInterval is called, so it's fine. But in edge paths like
queueExpirationRefresh's catch block or checkAndHandleAuthRefresh,
retryCount can still be 0, giving exponent -1 and halving the interval.

This is the behaviour we always had, i think it is intended, we can cut a ticket for a fix if you think it is necessary

@franco-zalamena-iterable
Copy link
Contributor Author

#: 5
Bug: Thread-unsafe authTokenReadyListeners
Summary: Plain ArrayList with no synchronization, accessed from multiple
threads. The snapshot copy in notifyAuthTokenReadyListeners doesn't protect
against concurrent modification during copy construction.

This should not really happen anywhere since we only have 1 call place, but i can make the fix as it is a simple one

@franco-zalamena-iterable
Copy link
Contributor Author

#: 7
Bug: updateFromRemoteConfig dead code
Summary: Never called from production code. The remote config parsing only
reads KEY_OFFLINE_MODE and KEY_AUTO_RETRY, never endpoint classification.
The method exists and is tested in isolation but never wired up.

This is intentional, i can rename the function but it is just a placeholder if we get this list to be dynamic at some point

@franco-zalamena-iterable
Copy link
Contributor Author

franco-zalamena-iterable commented Mar 23, 2026

#: 6
2. setAuthState has a TOCTOU race (IterableAuthManager.java:109-114). The
read of this.authState and subsequent write are not atomic — two threads
could both read INVALID, both transition, and both fire notifications.
Consider making setAuthState synchronized.

The duplicate notification scenario requires two threads to both read INVALID
simultaneously and both transition to a non-INVALID state. In practice:

  • handleAuthTokenSuccess sets UNKNOWN (token obtained)
  • setIsLastAuthTokenValid(true) sets VALID (request succeeded)

These happen sequentially — you get a token first, then make a request, then
it succeeds. They can't race because the request can't succeed before the
token is obtained.

The only realistic "race" would be two simultaneous handleAuthTokenSuccess
calls, but requestNewAuthToken is already synchronized and has the pendingAuth
guard preventing concurrent token requests.

Worst case if it did happen: onAuthTokenReady() fires twice, which calls
runNow() twice, which is idempotent (it just posts a message to the handler,
and processTasks() handles an empty queue gracefully).

Copy link
Collaborator

@sumeruchat sumeruchat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@joaodordio joaodordio merged commit 7da40fc into master Mar 23, 2026
8 of 11 checks passed
@joaodordio joaodordio deleted the autoretry-master branch March 23, 2026 21:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants